🌻 Comparing groups and testing the differences#

22 Sep 2025

Summary#

How can we use a factors table to summarise a causal map, compare groups (districts, gender, questionnaire sections, etc.), and decide which differences are worth attention?

This extension is essentially a second-stage transformation:

  1. We start with a links table (one row per coded causal claim, with a source_id and any source metadata).
  2. From that we derive a factors table (one row per factor label) by aggregating the links table:
     - counts of citations (how many link rows mention the factor as cause or effect)
     - counts of sources (how many distinct sources mention it)
     - optionally, in/out splits (as cause vs as effect) and derived scores like “outcomeness” or influence/importance.
  3. This extension then transforms the factors table again by adding group breakdown columns and, optionally, a statistical test of whether observed group differences are larger than you would expect by chance.
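The first transformation (links table to factors table) can be sketched in a few lines. This is a minimal illustration, not the extension's actual implementation: the toy `links` rows, the column names `cause` and `effect`, and the function name `build_factors_table` are all assumptions made for the example.

```python
from collections import defaultdict

# Toy links table, invented for illustration: one row per coded causal claim.
links = [
    {"source_id": "s1", "cause": "Training", "effect": "Higher yields"},
    {"source_id": "s1", "cause": "Higher yields", "effect": "More income"},
    {"source_id": "s2", "cause": "Training", "effect": "Higher yields"},
    {"source_id": "s3", "cause": "Irrigation", "effect": "Higher yields"},
]


def build_factors_table(links):
    """Aggregate link rows into one summary row per factor label."""
    acc = defaultdict(lambda: {"as_cause": 0, "as_effect": 0, "sources": set()})
    for link in links:
        for role, key in (("cause", "as_cause"), ("effect", "as_effect")):
            entry = acc[link[role]]
            entry[key] += 1
            entry["sources"].add(link["source_id"])
    return {
        label: {
            # Assumption: a link counts once per role, so a self-loop
            # (cause == effect) would add two citations.
            "citations": e["as_cause"] + e["as_effect"],
            "source_count": len(e["sources"]),
            "as_cause": e["as_cause"],
            "as_effect": e["as_effect"],
        }
        for label, e in acc.items()
    }


factors = build_factors_table(links)
for label, row in sorted(factors.items()):
    print(label, row)
```

In this toy data “Higher yields” has four citations but only three sources, because source s1 mentions it twice (once as an effect and once as a cause).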

The factors table is already an interpretation layer#

The factors table is not “raw data”: it is a summary view derived from links.

That makes the factors table useful for:

Semantics of the main counts (and why totals are tricky)#

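Two common evidence units are citations (link rows that mention the factor as cause or effect) and sources (distinct sources that mention it).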

Some important consequences:

Totals and normalisations (what they mean)#

When you add totals or percentages, you are choosing a baseline:

These are not cosmetic: they encode a stance on whether you care about absolute volume or relative emphasis.

The core idea: breakdown columns#

Let \(G\) be a categorical group variable attached to sources (e.g. district, gender, section). For each factor \(f\), we create extra columns that “break out” the factor’s frequency by the levels of \(G\).
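As a rough sketch of what such breakdown columns contain, the code below counts, for each factor and each level of \(G\), both the citations and the distinct sources. The toy data, the `district` column, and the function name `breakdown_by_group` are assumptions for illustration only.

```python
from collections import defaultdict

# Toy links table, invented for illustration. The group variable (district)
# belongs to the source and is repeated on each of that source's link rows.
links = [
    {"source_id": "s1", "district": "North", "cause": "Training", "effect": "Higher yields"},
    {"source_id": "s1", "district": "North", "cause": "Higher yields", "effect": "More income"},
    {"source_id": "s2", "district": "South", "cause": "Training", "effect": "Higher yields"},
    {"source_id": "s3", "district": "South", "cause": "Irrigation", "effect": "Higher yields"},
]


def breakdown_by_group(links, group_key):
    """Return {factor: {level: {"citations": n, "sources": m}}}."""
    citations = defaultdict(lambda: defaultdict(int))
    sources = defaultdict(lambda: defaultdict(set))
    for link in links:
        level = link[group_key]
        for label in (link["cause"], link["effect"]):
            citations[label][level] += 1
            sources[label][level].add(link["source_id"])
    # Every factor gets a cell for every level, including explicit zeros.
    levels = sorted({link[group_key] for link in links})
    return {
        label: {
            level: {
                "citations": citations[label].get(level, 0),
                "sources": len(sources[label].get(level, ())),
            }
            for level in levels
        }
        for label in citations
    }


breakdown = breakdown_by_group(links, "district")
for label, cells in sorted(breakdown.items()):
    print(label, cells)
```

Here “Higher yields” has two citations in each district, but they come from one source in North and two in South, so the two counting units can tell different stories.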

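Two natural “units of analysis” exist: citations (for each group level, how many link rows from that group’s sources mention the factor) and sources (for each group level, how many distinct sources in that group mention it).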

These extra columns let you ask questions like:

Optional normalisation: percent-of-baseline view#

Raw counts can be misleading when groups differ in overall verbosity (e.g. one group has more sources, or makes more links per source).

A complementary view is to show each cell as a percent of that group’s total across all factors (a “percent-of-baseline” normalisation). This re-expresses the question as:

“Is this factor unusually prominent given how much this group mentions factors overall?”
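A small worked example may make the normalisation concrete. The counts below are invented, and `percent_of_baseline` is a hypothetical helper name; each cell is divided by its group's total across all factors.

```python
# Invented citation counts: {factor: {group level: count}}.
counts = {
    "Training": {"North": 6, "South": 2},
    "Higher yields": {"North": 10, "South": 4},
    "More income": {"North": 4, "South": 4},
}


def percent_of_baseline(counts):
    """Express each cell as a percent of its group's total over all factors."""
    totals = {}
    for by_group in counts.values():
        for level, n in by_group.items():
            totals[level] = totals.get(level, 0) + n
    return {
        factor: {
            # A group with no mentions at all gets 0.0 rather than a division error.
            level: (100.0 * n / totals[level]) if totals[level] else 0.0
            for level, n in by_group.items()
        }
        for factor, by_group in counts.items()
    }


shares = percent_of_baseline(counts)
for factor, row in shares.items():
    print(factor, row)
```

“More income” has the same raw count (4) in both districts, yet it accounts for 20% of North's mentions and 40% of South's, because South makes half as many mentions overall.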

Optional inference: significance testing#

When you choose exactly one grouping variable \(G\), you can attach a statistical test to each factor that asks whether the factor’s distribution across group levels departs from what the groups’ overall totals would lead you to expect.

Intuition (chi-squared style):

Even if group A and group B differ in total mentions, is factor \(f\) over-represented in one group relative to that baseline?
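One way to make this intuition concrete is a Pearson chi-squared test on a 2 × k table: the factor's count versus all other factors' counts, across the k group levels. The sketch below is an assumption about how such a test could be set up, not a description of the extension's exact procedure. The counts are invented, and `chi2_sf` and `factor_chi2` are hypothetical helpers. In practice a library routine such as `scipy.stats.chi2_contingency` would be the more usual choice.

```python
import math

# Invented citation counts: {factor: {group level: count}}.
counts = {
    "Training": {"North": 30, "South": 10},
    "Higher yields": {"North": 50, "South": 20},
    "More income": {"North": 20, "South": 20},
}


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r < df/2} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction series.
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


def factor_chi2(counts, factor):
    """Test one factor against the baseline of all other factors."""
    levels = sorted(counts[factor])
    col_totals = [sum(row.get(lv, 0) for row in counts.values()) for lv in levels]
    observed = [
        [counts[factor][lv] for lv in levels],                              # this factor
        [t - counts[factor][lv] for lv, t in zip(levels, col_totals)],      # everything else
    ]
    row_totals = [sum(row) for row in observed]
    grand = sum(col_totals)
    if len(levels) < 2 or min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("table has an empty row or column; test is undefined")
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected) ** 2 / expected
    df = len(levels) - 1
    return stat, df, chi2_sf(stat, df)


stat, df, p = factor_chi2(counts, "More income")
print(f"chi2={stat:.3f}, df={df}, p={p:.4f}")
```

The chi-squared approximation assumes independent observations and reasonably large expected counts. Citations from the same source are not strictly independent, so a p-value computed this way is better read as a screening signal than as a formal result.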

For numeric-like groupings (e.g. ordered age bands), an ordinal trend interpretation can be more powerful than treating levels as unordered categories.
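The Cochran-Armitage test for trend is one common way to use the ordering of the levels. It is shown here only as an example of an ordinal trend test, not as the method the extension necessarily uses. The function name `trend_test`, the equally spaced scores 0, 1, 2, and the counts are all assumptions for illustration.

```python
import math


def trend_test(mentions, totals, scores):
    """Cochran-Armitage trend test.

    mentions[i]: the factor's count in ordered group i
    totals[i]:   the group's baseline count (all factors) in group i
    scores[i]:   numeric score encoding the group's position in the ordering
    Returns (z, two-sided p-value from the normal approximation).
    """
    n_total = sum(totals)
    p_bar = sum(mentions) / n_total
    # Score-weighted sum of (observed - expected) mentions.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, mentions, totals))
    weighted = sum(n * s for n, s in zip(totals, scores))
    spread = sum(n * s * s for n, s in zip(totals, scores)) - weighted ** 2 / n_total
    variance = p_bar * (1 - p_bar) * spread
    if variance <= 0:
        raise ValueError("no variation in mentions or scores; test is undefined")
    z = t_stat / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Invented example: three ordered age bands with equal baselines.
z, p = trend_test(mentions=[10, 20, 30], totals=[100, 100, 100], scores=[0, 1, 2])
print(f"z={z:.3f}, p={p:.5f}")
```

With only two groups the squared statistic reduces to the ordinary Pearson chi-squared value for the 2 × 2 table, so the trend test only adds something when there are three or more ordered levels.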

Why this matters#

Group comparisons help you move from “what is mentioned most often?” to “what differs between contexts?”. That second question is often the analytic point of multi-source causal mapping: you are comparing perspectives, contexts, or sub-populations, not estimating causal effect sizes.